Media Search Processing Using Partial Schemas

ABSTRACT

A process generates searchable content for visual media files. The process uses a set of schemas, including a source schema and a keyword schema. The process uses workers, each specifying its input schemas and its output schemas. A dependency graph includes a node for each worker, with dependencies based on the input and output schemas. The process constructs a source schema instance for a selected visual media file, and the process traverses nodes in the graph beginning with an initial worker process according to the media type. One or more worker processes insert search terms into the keyword schema instance. The process stores the keyword schema instance in a database for subsequent media queries.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/697,336, filed Sep. 6, 2017, entitled “Media Search Processing Using Partial Schemas,” which claims priority to U.S. Provisional Application Ser. No. 62/384,145, filed Sep. 6, 2016, entitled “Media Search Processing Using Partial Schemas,” each of which is incorporated by reference herein in its entirety.

This application is related to U.S. patent application Ser. No. 14/941,502, filed Nov. 13, 2015, entitled “Systems and Methods of Building and Using an Image Catalog,” now U.S. Pat. No. 10,318,575, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The disclosed implementations relate generally to searching a document repository and more specifically to a processing methodology for constructing searchable content for visual media files.

BACKGROUND

Collections of visual media files (e.g., images and video) are growing in size and are often in multiple locations. Media repositories may exist on local storage for mobile and desktop devices, dedicated network-attached storage (NAS), or on remote cloud services. It is particularly difficult to search media files. Whereas textual queries can be matched to text content of ordinary documents, an image or video does not include text that can be directly matched. In addition, because of the vast quantity of media files, a manual scan of the media file universe is generally not productive. Furthermore, brute force approaches, such as performing OCR on an entire image, do not necessarily capture critical characteristics that would be relevant to a search query.

SUMMARY

Disclosed implementations address the above deficiencies and other problems associated with managing media files. The present disclosure is directed towards processes that provide visual insight, discovery, and navigation into collections of millions of media files. A user can search across an entire portfolio using textual queries, which are matched against semantic information extracted from the media files.

Disclosed implementations generate searchable content using groups of interrelated worker processes, which can be customized for particular scenarios. For example, the worker processes applied to a set of landscape images may be quite different from the worker processes applied to an animated movie. Each worker process specifies a set of partial schemas that it needs as input and specifies a set of partial schemas that it creates. Each partial schema contains a specific group of data fields, each with a specified data type. Each partial schema instance includes data for a specific media file. In some cases, not all of the data fields have data for every media file. The input and output schemas for each worker process impose a partial ordering on the worker processes. One of the output schema instances includes a set of keywords for the processed media file. One partial schema that is used at the outset of the process is a source schema, which includes basic information about the source file being processed.

The source and keywords schemas are just two of many partial schemas provided in media processing implementations. In addition, users can create new worker processes and new partial schemas, and define which partial schemas each worker process creates or uses. Some implementations enable users to extend existing schemas (e.g., adding additional data fields). Some implementations provide the extensibility through an SDK for developers.

Each partial schema is roughly “a set of named and typed data fields providing a logical grouping of a semantic concept.” These partial schemas provide the formal inputs and outputs to each processing node. Some of the partial schemas are internal to a processing network. These are used to coordinate processing among a set of nodes. Defining a schema allows nodes and clients of the framework to be developed independently, which facilitates modular development and scaling. Some of the partial schemas are defined by the inputs to the system (e.g., images, videos, and PDFs) and are stored as the outputs in a database (e.g., keywords, automatically computed document categories, Boolean values determined through vision analysis, and so on).

For example, some implementations define an image schema to include: a width, a height, a color type, and a precision. Processing nodes that work with images can use this definition to perform their work. The worker processes for the nodes can be developed independently, and can rely on this definition to coordinate their work. Similarly, client applications can be written that rely on the aspect ratio to display the image.
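The image schema described above can be sketched as a small typed record. This is a minimal illustration only: the class name `ImageSchema`, the field types, the reading of "precision" as bits per channel, and the helper `field_types` are assumptions, not definitions taken from the disclosure.

```python
from dataclasses import dataclass, fields


@dataclass
class ImageSchema:
    """Hypothetical partial schema: a named group of typed data fields."""
    width: int
    height: int
    color_type: str   # e.g. "RGB" (assumed representation)
    precision: int    # assumed here to mean bits per channel

    def aspect_ratio(self) -> float:
        # Derived value a client could rely on when laying out the image.
        return self.width / self.height


def field_types(schema_cls):
    """Return the schema's field names mapped to their declared types."""
    return {f.name: f.type for f in fields(schema_cls)}


image = ImageSchema(width=1920, height=1080, color_type="RGB", precision=8)
```

Because both a processing node and a client application depend only on this definition, either side can be developed without knowledge of the other.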

A more traditional database has a single monolithic schema. In contrast, implementations here utilize a flexible and extensible collection of partial schemas that can be combined differently for each media file collection. This allows considerable reuse of processing components, and enables third parties to develop their own processing nodes for their clients that interoperate with the rest of the platform.

On the other end of the spectrum, a no-SQL database has no schema at all, just a flat set of named fields. In this “Wild West” environment, a developer can do anything, but such a system does not scale or provide a foundation for modular development.

In some implementations, some of the worker processes apply computer vision algorithms to media files (e.g., images) in order to extract metadata. The computer vision algorithms include: deep convolutional neural networks to extract keywords; optical character recognition to extract text (e.g., jersey numbers, signs, and logos); facial recognition to match faces to names; color analysis; and structural analysis (e.g., using SIFT). In addition, some worker processes extract existing metadata for each media file, such as its origin, creation date, author, location, camera type, and statistical information.

The partial schemas enable modular development because each worker process defines which schemas it needs and which schemas it creates. In addition, by saving the partial schemas, some implementations enable efficient reprocessing. For example, one worker process (of many) may be modified without changing the others. The modified worker process can begin by using the saved schemas that it needs, and only subsequent worker processes that rely on the output of the modified worker process (either directly or indirectly) need to be reprocessed.
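The reprocessing rule above (rerun the modified worker plus everything that depends on its output, directly or indirectly) can be sketched as a breadth-first search over schema names. The worker names, schema names, and the function `workers_to_rerun` are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical workers: name -> (input schema names, output schema names).
WORKERS = {
    "load_image": ({"source"}, {"image"}),
    "find_text_boxes": ({"image"}, {"text_boxes"}),
    "ocr": ({"image", "text_boxes"}, {"ocr_text"}),
    "classify": ({"image"}, {"keywords"}),
    "ocr_keywords": ({"ocr_text"}, {"keywords"}),
}


def workers_to_rerun(workers, modified):
    """Return the modified worker plus all workers downstream of it."""
    rerun = {modified}
    queue = deque([modified])
    while queue:
        current = queue.popleft()
        produced = workers[current][1]
        for name, (inputs, _outputs) in workers.items():
            # A worker is affected if it consumes any schema produced upstream.
            if name not in rerun and inputs & produced:
                rerun.add(name)
                queue.append(name)
    return rerun


affected = workers_to_rerun(WORKERS, "find_text_boxes")
```

In this example, modifying `find_text_boxes` leaves `load_image` and `classify` untouched, provided the saved `image` schema instance is still available as input.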

In accordance with some implementations, a method generates searchable content for visual media files. The method is performed at a computing system having one or more processors and memory. The method defines a set of schemas. The schemas are sometimes referred to as “partial schemas” and each schema includes a respective plurality of related data fields, each having a specified data type. The set of schemas includes a source schema, which includes basic information about a source media file, and a keyword schema, which is filled in during processing to include keywords relevant to the media file. The set of schemas typically includes many partial schemas in addition to the source and keyword schemas, as illustrated below in FIGS. 6A-6H.

The method defines worker processes, where each worker process definition specifies a respective set of one or more input schemas from the defined set of schemas and each worker process definition specifies a respective set of one or more output schemas from the defined set of schemas. The method builds a dependency graph (also called a process flow graph) that includes a node for each worker process, with dependencies based on the input schemas and output schemas for each worker process. The dependency graph includes multiple initial worker processes, and each initial worker process corresponds to a distinct media type. The respective set of input schemas for each initial worker process consists of the source schema.
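One way to derive such a dependency graph is to connect each worker to every worker that produces one of its input schemas. The following is a minimal sketch under that assumption; the `WorkerDef` class, the worker and schema names, and both helper functions are hypothetical. Python's standard `graphlib.TopologicalSorter` (Python 3.9+) is used only to obtain one valid execution order.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class WorkerDef:
    """Hypothetical worker definition: declared input and output schemas."""
    name: str
    inputs: frozenset
    outputs: frozenset


WORKER_DEFS = [
    WorkerDef("image_init", frozenset({"source"}), frozenset({"image"})),
    WorkerDef("video_init", frozenset({"source"}), frozenset({"video"})),
    WorkerDef("detect_text_boxes", frozenset({"image"}), frozenset({"text_boxes"})),
    WorkerDef("ocr", frozenset({"image", "text_boxes"}), frozenset({"ocr_text"})),
    WorkerDef("keywords_from_ocr", frozenset({"ocr_text"}), frozenset({"keyword"})),
]


def build_dependency_graph(workers):
    """Map each worker name to the set of worker names it depends on."""
    producers = {}
    for w in workers:
        for schema in w.outputs:
            producers.setdefault(schema, set()).add(w.name)
    # The source schema has no producer: it is constructed before traversal.
    return {
        w.name: {p for s in w.inputs for p in producers.get(s, ())} - {w.name}
        for w in workers
    }


def initial_workers(workers):
    """Initial workers are those whose only input is the source schema."""
    return [w.name for w in workers if w.inputs == frozenset({"source"})]


graph = build_dependency_graph(WORKER_DEFS)
order = list(TopologicalSorter(graph).static_order())
```

The partial ordering falls out of the declarations alone: no worker names another worker explicitly, only the schemas it consumes and produces.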

The method receives selection of a plurality of visual media files and constructs a respective source schema instance for each of the selected visual media files, filling in fields in the source schema instance using information about the respective visual media file. For each selected visual media file, the method traverses nodes in the dependency graph beginning with a respective initial worker process corresponding to a media type of the visual media file, thereby executing a plurality of worker processes, which construct a plurality of additional distinct schema instances. One or more of the worker processes executed during the traversal inserts search terms into a respective keyword schema instance. The method stores data from the keyword schema instance and a link to the corresponding visual media file in a database for subsequent searching of visual media files.
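The per-file traversal can be approximated by repeatedly running any worker whose input schema instances are already available, which visits workers in an order consistent with the dependency graph. This is a simplified sketch, not the disclosed implementation: the worker stubs, the dictionary-based schema instances, and the fixed image dimensions are all made up for illustration.

```python
import os


def run_image_init(instances):
    # Stub: a real worker would decode the file; these dimensions are made up.
    return {"image": {"width": 640, "height": 480}}


def run_video_init(instances):
    return {"video": {"frame_count": 0}}


def run_orientation_tagger(instances):
    image = instances["image"]
    term = "landscape" if image["width"] > image["height"] else "portrait"
    return {"keywords": {term}}


def run_filename_tagger(instances):
    path = instances["source"]["path"]
    stem = os.path.splitext(os.path.basename(path))[0]
    return {"keywords": set(stem.lower().split("_"))}


# Hypothetical worker table; media_type is set only for initial workers.
WORKER_TABLE = [
    {"name": "image_init", "media_type": "image", "inputs": ["source"], "run": run_image_init},
    {"name": "video_init", "media_type": "video", "inputs": ["source"], "run": run_video_init},
    {"name": "orientation", "media_type": None, "inputs": ["image"], "run": run_orientation_tagger},
    {"name": "filename", "media_type": None, "inputs": ["source", "image"], "run": run_filename_tagger},
]


def process_file(path, media_type, workers):
    """Build the source schema instance, then run every worker that becomes ready."""
    instances = {"source": {"path": path, "media_type": media_type}}
    pending = [w for w in workers if w["media_type"] in (None, media_type)]
    progressed = True
    while pending and progressed:
        progressed = False
        for worker in list(pending):
            if all(name in instances for name in worker["inputs"]):
                for schema_name, data in worker["run"](instances).items():
                    if schema_name == "keywords":
                        # Several workers may insert terms into the same instance.
                        instances.setdefault("keywords", set()).update(data)
                    else:
                        instances[schema_name] = data
                pending.remove(worker)
                progressed = True
    return instances


result = process_file("beach_sunset.jpg", "image", WORKER_TABLE)
```

After the loop, the `keywords` entry together with the file path is what would be written to the database for later searching.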

In some implementations, partial schemas provide a way of communicating data between the nodes in the graph. Some of the partial schemas are used during processing and discarded, but other schemas are stored in a database (e.g., for subsequent searching and/or reprocessing). For example, one node may compute boxes that surround regions that may include text. Data for these boxes is placed in a partial schema for subsequent worker processes that perform OCR on the content of the boxes. Although these boxes do not include keywords, some implementations save the partial schemas for the boxes for reprocessing. In some implementations, the box information is discarded after processing is complete. Similarly, the OCR text from a processing node may be stored permanently in the database, or stored only temporarily in a partial schema, enabling other worker processes to analyze the OCR text (e.g., another worker process may identify keywords in the scanned text). In this example, the partial schema with the keywords is stored (for subsequent searching), but the partial schemas for the boxes and OCR text may be saved or discarded depending on the implementation (e.g., based on complexity or usefulness for reprocessing).

In some implementations, the method receives a search query from a user, where the search query includes multiple textual terms. The method matches the received search query to one or more keyword schema instances. The method then returns, to the user, search results that identify visual media files corresponding to the matched keyword schema instances.
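A multi-term query can be matched against stored keyword data with a simple inverted index. The sketch below assumes that every query term must match (AND semantics) and that matching is case-insensitive; the disclosure does not specify either, and the function names and sample records are hypothetical.

```python
from collections import defaultdict


def build_index(keyword_records):
    """Map each lowercase keyword to the media file references that carry it."""
    index = defaultdict(set)
    for media_ref, terms in keyword_records.items():
        for term in terms:
            index[term.lower()].add(media_ref)
    return index


def search(index, query):
    """Return media references whose keywords contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)


# Hypothetical keyword data keyed by media file reference.
KEYWORD_RECORDS = {
    "photos/beach.jpg": {"beach", "sunset", "landscape"},
    "photos/city.jpg": {"city", "sunset"},
    "videos/game.mp4": {"soccer", "jersey"},
}

index = build_index(KEYWORD_RECORDS)
```

For example, `search(index, "sunset")` returns both photos, while `search(index, "beach sunset")` narrows the result to the beach image.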

In some implementations, the method stores additional schema instances in the database. In some implementations, the method stores data for all of the schema instances that are created during traversal of the graph. In some implementations, the method stores data for a plurality of the additional schema instances. In some implementations, a user can designate which of the schema instances are stored.

In some implementations, the method receives a search query from a user, where the search query includes one or more textual terms. The method matches the received search query to one or more of the stored schema instances. The method then returns search results to the user. The search results identify visual media files corresponding to the matched schema instances.

In some implementations, the method includes a recursive loop, which extracts embedded media files from an existing file, and adds the extracted files to the set of media files for processing. For example, while processing a PDF file or other multipage document, a worker process may identify embedded image or video files. In some implementations, traversing nodes in the dependency graph for a first visual media file includes executing a first worker process that extracts one or more additional visual media files from within the first visual media file and adds the additional visual media files to the selected visual media files.
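The extraction loop can be sketched with a work queue: each processed file may contribute newly extracted files, which are appended to the same queue. The sketch uses an iterative queue rather than literal recursion, and the `extract_embedded` callback and the sample file names are hypothetical stand-ins for a real extraction worker.

```python
from collections import deque


def process_all(selected_files, extract_embedded):
    """Process every selected file, including files extracted along the way."""
    queue = deque(selected_files)
    seen = set(selected_files)
    processed = []
    while queue:
        media_file = queue.popleft()
        processed.append(media_file)  # stand-in for the graph traversal
        for child in extract_embedded(media_file):
            if child not in seen:  # guard against processing a file twice
                seen.add(child)
                queue.append(child)
    return processed


# Hypothetical embedded content: one PDF containing an image and a video.
EMBEDDED = {"report.pdf": ["report.pdf#image1", "report.pdf#clip1"]}

processed = process_all(
    ["photo.jpg", "report.pdf"],
    lambda media_file: EMBEDDED.get(media_file, []),
)
```

Because extracted files re-enter the same queue, an image embedded in a PDF is handled by the same image pipeline as a standalone image.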

In some implementations, a worker process extracts full pages from within a PDF or other multipage document, converts each page to an image, and submits them through an image processing pipeline for analysis. This can be particularly useful for scanned documents. The disclosed processes can identify both text and embedded images, and create searchable text for the scanned pages.

In some instances, the media files include one or more image files (e.g., JPEG, PNG, or TIFF), one or more video files (e.g., MP4, MOV, or AVI), and/or one or more multipage documents (such as PDF documents or other documents that contain embedded images or video).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a context in which some implementations operate.

FIG. 2 is a block diagram of a client device in accordance with some implementations.

FIG. 3 is a block diagram of a server in accordance with some implementations.

FIG. 4 provides a skeletal data structure for storing a media catalog in accordance with some implementations.

FIG. 5A shows a process flow graph used in the process for generating searchable content in accordance with some implementations.

FIG. 5B provides a skeletal data structure for the nodes in the data flow graph of FIG. 5A, in accordance with some implementations.

FIG. 5C provides a process flow for generating searchable content for media files, in accordance with some implementations.

FIG. 5D is a user interface window displayed while importing media files into a media catalog, in accordance with some implementations.

FIGS. 6A-6G are skeletal partial schemas that are used for generating searchable content, in accordance with some implementations.

FIG. 6H illustrates a custom partial schema in accordance with some implementations.

FIG. 7 provides a screen shot of a media application in accordance with some implementations.

Like reference numerals refer to corresponding parts throughout the drawings.

DESCRIPTION OF IMPLEMENTATIONS

Reference will now be made to various implementations, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention and the described implementations. However, the invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the implementations.

FIG. 1 illustrates a context in which some implementations operate. A media file repository 102 stores images 104, videos 106, and/or multimedia documents (e.g., PDF) 108. In some implementations, there are two or more media file repositories 102. A typical media file repository 102 may store millions of media files or more. In some implementations, the media files include images (e.g., JPEG, TIFF, PNG, GIF, BMP, CGM, or SVG). In some implementations, the media files include videos or sound recordings. In some implementations, all of the media files in the repository 102 have the same type, but some repositories 102 include a heterogeneous collection of multimedia files.

In the illustrated implementation, there is a server system 116, which includes one or more servers 300. In some implementations, the server system 116 consists of a single server 300. More commonly, the server system 116 includes a plurality of servers 300 (e.g., 20, 50, 100, or more). In some implementations, the servers 300 are connected by an internal communication network or bus 130. The server system 116 includes one or more web servers 118, which receive requests from users (e.g., from a client device 110) and return appropriate information, resources, links, and so on. In some implementations, the server system 116 includes one or more application servers 120, which provide various applications, such as a media application 112. The server system 116 typically includes one or more databases 122, which store information such as web pages, a user list, and various user information (e.g., user names and encrypted passwords, user preferences, and so on). The database here stores a process flow graph 124, as described below with respect to FIGS. 5A and 5B. The database also stores a media catalog 126, which includes information about media files that have been imported. The media catalog 126 is described in more detail below with respect to FIG. 4. The media catalog 126 stores data about each imported media file, including a set of keywords 128. Typically, the keywords are populated during import, using the techniques described in the present application.

The server system 116 also includes a media processing engine 132, which is sometimes referred to as an import engine. Note that the media processing engine 132 is not limited to the import process. For example, a user may create additional processing logic after media files are already imported. The media processing engine 132 can be reapplied, using the updated logic, to generate updated search terms for media files that are already in the media catalog 126. The media processing engine 132 uses multiple worker processes 134-1, 134-2, 134-3, . . . to analyze each media file and generate the searchable content. As illustrated below in FIGS. 5A and 5B, each worker process corresponds to a node in the process flow graph 124. In some implementations, each worker process 134 corresponds to a unique object class or executable program.

The media file repositories 102, client devices 110, and the server system 116 are connected by one or more networks 114, such as the Internet and one or more local area networks.

In some implementations, some of the functionality described with respect to the server system 116 is performed by a client device 110.

FIG. 2 is a block diagram illustrating a client device 110 that a user uses to access a media application 112. A client device is also referred to as a computing device, which may be a tablet computer, a laptop computer, a smart phone, a desktop computer, a PDA, or other computing device that can run the media application 112 and has access to a communication network 114. A client device 110 typically includes one or more processing units (CPUs) 202 for executing modules, programs, or instructions stored in the memory 214 and thereby performing processing operations; one or more network or other communications interfaces 204; memory 214; and one or more communication buses 212 for interconnecting these components. The communication buses 212 may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. A client device 110 includes a user interface 206 comprising a display device 208 and one or more input devices or mechanisms 210. In some implementations, the input device/mechanism includes a keyboard and a mouse; in some implementations, the input device/mechanism includes a “soft” keyboard, which is displayed as needed on the display device 208, enabling a user to “press keys” that appear on the display 208.

In some implementations, the memory 214 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. In some implementations, the memory 214 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. In some implementations, the memory 214 includes one or more storage devices remotely located from the CPU(s) 202. The memory 214, or alternately the non-volatile memory device(s) within the memory 214, comprises a non-transitory computer readable storage medium. In some implementations, the memory 214, or the computer readable storage medium of the memory 214, stores the following programs, modules, and data structures, or a subset thereof:

- an operating system 216, which includes procedures for handling various basic system services and for performing hardware dependent tasks;
- a communications module 218, which is used for connecting the client device 110 to other computers and devices via the one or more communication network interfaces 204 (wired or wireless) and one or more communication networks 114, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
- a display module 220, which receives input from the one or more input devices 210, and generates user interface elements for display on the display device 208;
- a web browser 222, which enables a user to communicate over a network 114 (such as the Internet) with remote computers or devices;
- a media application 112, which enables a user to search and retrieve documents from one or more remote document repositories 102 or local document repository 240. The media application 112 provides a user interface 224, as illustrated below by the screenshot in FIG. 7. The media application 112 also includes a retrieval module 226, which retrieves media files (or thumbnails) corresponding to a search query or search folder;
- application data 230, which includes a set of search results 236, and may include thumbnail images 238 for each one of the identified media files in the search results. In some instances, the user retrieves one or more full media files 232 based on the search results 236; and
- in some implementations, the memory stores a local media file repository 240, such as a personal photo album or artwork portfolio.

Each of the above identified executable modules, applications, or sets of procedures may be stored in one or more of the previously mentioned memory devices and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, the memory 214 stores a subset of the modules and data structures identified above. Furthermore, the memory 214 may store additional modules or data structures not described above.

Although FIG. 2 shows a client device 110, FIG. 2 is intended more as a functional description of the various features that may be present rather than as a structural schematic of the implementations described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated.

FIG. 3 is a block diagram illustrating a server 300. In some implementations, a server 300 is one of a plurality of servers in a server system 116. A server 300 typically includes one or more processing units (CPUs) 302 for executing modules, programs, or instructions stored in the memory 314 and thereby performing processing operations; one or more network or other communications interfaces 304; memory 314; and one or more communication buses 312 for interconnecting these components. The communication buses 312 may include circuitry (sometimes called a chipset) that interconnects and controls communications between system components. In some implementations, a server 300 includes a user interface 306, which may include a display device 308 and one or more input devices 310, such as a keyboard and a mouse.

In some implementations, the memory 314 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. In some implementations, the memory 314 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. In some implementations, the memory 314 includes one or more storage devices remotely located from the CPU(s) 302. The memory 314, or alternately the non-volatile memory device(s) within the memory 314, comprises a non-transitory computer readable storage medium. In some implementations, the memory 314, or the computer readable storage medium of the memory 314, stores the following programs, modules, and data structures, or a subset thereof:

- an operating system 316, which includes procedures for handling various basic system services and for performing hardware dependent tasks;
- a communications module 318, which is used for connecting the server 300 to other computers via the one or more communication network interfaces 304 (wired or wireless) and one or more communication networks 114, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
- a display module 320, which receives input from one or more input devices 310, and generates user interface elements for display on a display device 308;
- one or more web servers 118, which receive requests from a client device 110, and return responsive web pages, resources, or links. In some implementations, each request is logged in the database 122;
- one or more application servers 120, which provide various applications (such as a media application 112) to the client devices 110. In some instances, applications are provided as a set of web pages, which are delivered to the client devices 110 and displayed in a web browser 222. The web pages are delivered as needed or requested. In some instances, an application is delivered to a client device 110 as a download, which is installed and run from the client device 110 outside of a web browser 222;
- in some implementations, the application server provides a retrieval module 226 as part of the media application 112. In other implementations, the retrieval module 226 is a separate application provided by the application server 120. The retrieval module retrieves media files (or thumbnails) corresponding to a search query or search folder;
- some implementations include a user interface engine 326, which provides the user interface 224 for users of the media application 112;
- a query engine 330, which is used to identify media files corresponding to a user's textual search queries, and return responsive search results;
- an import engine (also known as a media processing engine) 132, which processes media files to generate searchable content, and is described in more detail below with respect to FIGS. 5A-5D. The import engine uses a plurality of worker processes 134-1, 134-2, . . . to generate the searchable content. Each of the worker processes 134 corresponds to a node in the process flow graph 124;
- one or more databases 122, which store various data used by the modules or programs identified above. In some implementations, the database 122 includes a list of authorized users 336, which may include user names, encrypted passwords, and other relevant information about each user. In some implementations, the database 122 also stores search folder definitions 338, which specify what media files are associated with user-created folders;
- the database 122 also stores a media catalog 126, which identifies a list of media files that have been imported. Each media file in the catalog has an associated media file id 340 (e.g., a globally unique identifier), and a media file reference 342, which is a link or address of the media file (e.g., a URL or network address). Note that implementations typically do not save new copies of the media file during the import process, so the media files remain in their original locations. The data for each media file also includes various metadata 344 (e.g., author, creation timestamp, creation location, and so on). When the media processing engine 132 runs (or reruns), the process creates or updates the set of keywords 128 for the media file. In some implementations, the media catalog 126 stores one or more thumbnails 346 for each media file. In some implementations, the media catalog 126 also stores the partial schemas 348 that are generated during the import processing. In some implementations, only a specified subset of the partial schemas are saved. The saved partial schemas may be used in subsequent searching of the media catalog;
- the database also stores the process flow graph 124, which is used by the worker processes 134. In particular, the process flow graph includes the nodes 500, as illustrated below in FIGS. 5A and 5B, as well as the dependencies between the nodes; and
- zero or more media file repositories 102, which contain the actual media files (e.g., images and videos).

Each of the above identified elements in FIG. 3 may be stored in one or more of the previously mentioned memory devices. Each executable program, module, or procedure corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, the memory 314 stores a subset of the modules and data structures identified above. Furthermore, the memory 314 may store additional modules or data structures not described above.

Although FIG. 3 illustrates a server 300, FIG. 3 is intended more as a functional illustration of the various features that may be present in a set of one or more servers (the server system 116) rather than as a structural schematic of the implementations described herein. In practice, and as recognized by those of ordinary skill in the art, items shown separately could be combined and some items could be separated. The actual number of servers used to implement these features, and how features are allocated among them, will vary from one implementation to another, and may depend in part on the amount of data traffic that the system must handle during peak usage periods as well as during average usage periods.

As illustrated in FIGS. 2 and 3, the functionality for a media application may be shared between a client device 110 and a server system 116. In other implementations, the majority of the processing and data storage occurs at the server system 116, and the client device 110 uses a web browser 222 to view and interact with the data. One of skill in the art recognizes that various allocations of functionality between the client device 110 and the server system 116 are possible, and some implementations support multiple configurations (e.g., based on user selection).

FIG. 4 shows a skeletal media catalog 126. Each record in the mediacatalog 126 identifies a media file in one of the media filerepositories 102. Each media file is uniquely identified by a media fileID 340, and includes a media file reference 342 to identify the sourcelocation of the document. For example, the media file reference mayspecify a full path name, including server, volume, path, and file namefor a document stored on a local area network, or a URL with a file namefor documents retrieved over the Internet. Some implementations store amedia file type 402 for each media file. In some implementations, themedia file type 402 corresponds to the file name extension of the mediafile, such as “PDF”, “JPEG”, “TIFF”, “PNG”, “BMP”, “TXT”, “MP4”, and soon. In some implementations, the media file type specifies a generalcategory for each document, such as “VIDEO”, “IMAGE”, or “DOCUMENT”.

In some implementations, the media catalog 126 includes a list of keywords 128 for each document. In some implementations, the keywords are indexed.

In some instances, location information is available for the documents, which identifies where the document was created. For example, when the documents are images, GPS coordinates may be available for some of the images, and these coordinates are stored as a location 404 for the media file.

In some implementations, other metadata 344 is stored for each document, such as an author 406 and/or a creation datetime 408, or additional metadata 410.

In some implementations, the media catalog 126 also includes one or more thumbnail images or document summaries 346. For images, this is typically a small low-resolution copy of the image that can be used for reviewing many images at the same time. For textual documents, some implementations generate a summary or abstract of the document, such as a title and some key sentences. For videos, a thumbnail image may be a low resolution image of one or more video frames.

The media catalog 126 is typically populated by the import engine 132 during an import process. The user specifies various parameters for an import operation, such as a location of the repository, a directory of files in the repository, an optional filter of which documents to select, and so on. In some instances, the user specifies which custom fields to populate during the import process. Some of the techniques used for extracting information during the import process are described in application Ser. No. 14/941,502, filed Nov. 13, 2015, entitled “Systems and Methods of Building and Using an Image Catalog,” which is incorporated herein by reference in its entirety.

FIG. 5A provides a process flow graph 124, which is used by the media processing engine 132 to generate searchable content from media files. Each of the nodes 500 in the graph 124 corresponds to a specific worker process 518, as illustrated in FIG. 5B. In the illustrated graph 124, there are three initial nodes 500-1, 500-2, and 500-3. Each initial node corresponds to a unique media type 402. For example, the first initial node 500-1 may correspond to image files, the second initial node 500-2 may correspond to video files, and the third initial node 500-3 may correspond to PDF files. In some implementations, the media file types 402 are more granular. For example, there may be a separate initial node for each image type or sub-group of image types (e.g., separate initial nodes for JPEG versus PNG files). Initial nodes rely only on a source partial schema 600, which is illustrated below in FIG. 6A. In the node data structure, implementations may designate whether each node is an initial node using the field is_initial_node 512. For each initial node, the node also specifies the media_file_type 402 (which may be empty or null for non-initial nodes).

In some implementations, each node has a unique node_id 510, which may be a globally unique identifier. An important part of each node is the specification of input schemas 514 and output schemas 516. The input schemas 514 identify what partial schemas are required to be populated before the worker process 518 for the node can run. For example, the initial nodes 500-1, 500-2, and 500-3 specify only the source partial schema 600 as the input schemas 514. Generally, each node 500 generates one or more output schemas 516 as well, and these outputs can be used as inputs for the worker processes corresponding to other nodes. In some implementations, each node can also specify one or more parameters 520, which are used by the node's worker process 518 to specify how it runs (e.g., parameters used by a computer vision algorithm).

Because each node 500 specifies both inputs and outputs, it creates natural dependencies in the process flow graph 124. Because of this, a process flow graph 124 is also called a dependency graph. Each arrow in the process flow graph corresponds to a specific partial schema that is created by the node at the tail of the arrow and is used (“consumed”) by the node at the head of the arrow. As illustrated in FIG. 5A, a single node can have multiple input schemas and/or multiple output schemas. For example, node A 500-8 has two input schemas 502-5 and 502-6 and node A 500-8 also has two output schemas 502-7 and 502-8. Some partial schemas (e.g., the keyword partial schema) are not used by any subsequent node, and thus do not appear in the process flow graph 124 because they do not create dependencies.
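The way input and output schemas induce the arrows of the dependency graph can be sketched as follows. This is a minimal illustration only, assuming a hypothetical Node class whose field names mirror the node data structure of FIG. 5B; the build_edges helper and the use of reference numerals as string identifiers are assumptions made for the example. The nodes model the portion of FIG. 5A that follows the second initial node 500-2, and only two of the consumers of schema 502-3 are modeled.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Node:
    # Hypothetical structure; field names follow FIG. 5B as described in the text.
    node_id: str
    input_schemas: Set[str]
    output_schemas: Set[str]
    is_initial_node: bool = False
    media_file_type: Optional[str] = None
    parameters: dict = field(default_factory=dict)


def build_edges(nodes):
    """Return (producer_id, schema, consumer_id) for every schema that one
    node creates and another node consumes."""
    producers = {}
    for node in nodes:
        for schema in node.output_schemas:
            producers.setdefault(schema, []).append(node.node_id)
    edges = []
    for node in nodes:
        for schema in sorted(node.input_schemas):
            # A schema with no producing node (e.g., the source schema,
            # which exists before traversal) yields no arrow.
            for producer in producers.get(schema, []):
                edges.append((producer, schema, node.node_id))
    return edges


nodes = [
    Node("500-2", {"source"}, {"502-1"}, is_initial_node=True, media_file_type="VIDEO"),
    Node("500-4", {"502-1"}, {"502-2", "keyword"}),           # node B
    Node("500-5", {"502-1"}, {"502-3"}),                      # node C
    Node("500-6", {"502-3", "source"}, {"502-4"}),            # node D
    Node("500-7", {"502-2", "502-3", "502-4"}, {"keyword"}),  # node E
]
edges = build_edges(nodes)
for edge in edges:
    print(edge)
```

In this sketch neither the source schema nor the keyword schema produces an arrow: the former has no producing node, and the latter has no consuming node.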

In the illustrated process flow graph 124 in FIG. 5A, each of the initial nodes has a distinct set of nodes that follow (i.e., there are no nodes that can be reached starting from two different initial nodes). In this case, each worker process 518 is associated with a unique media file type 402. In some implementations, however, some of the nodes can be reached from two or more initial nodes.

Node A 500-8 illustrates several aspects of the process flow graph 124. First, Node A uses two distinct partial schemas 502-5 and 502-6 created by the first initial node 500-1. One of these input partial schemas 502-5 is also used by another node in the process flow graph 124. Node A 500-8 also creates two distinct output schemas 502-7 and 502-8, which are used by two other nodes.

The second initial node 500-2 creates an output partial schema 502-1 that is used by both node B 500-4 and node C 500-5. Node B 500-4 uses the input partial schema 502-1 and creates an output partial schema 502-2, which is used by node E 500-7. Note that node B could create other partial schemas as well, such as inserting terms into the keyword partial schema.

Node C 500-5 uses one input schema 502-1, and creates an output schema 502-3, which is used by three other nodes, including node D 500-6 and node E 500-7. Node D uses a single input schema 502-3, and creates an output schema 502-4 that is used by node E 500-7.

As illustrated in FIG. 5A, node E 500-7 uses three distinct input schemas 502-2, 502-3, and 502-4. Node E 500-7 creates one or more output schemas, which are not shown.

Because the source partial schema 600 is always created before the traversal of the graph begins, it does not create any dependencies. Because of this, there are no arrows in the process flow graph corresponding to the source partial schema 600. For example, node D 500-6 could use the source partial schema 600 in addition to the partial schema 502-3 created by node C 500-5.

One example of a worker process is the ImageProcessor, which is responsible for producing the image schema 604 by reading the source image file and extracting the metadata stored in the file such as the Exif or IPTC data stored in JPEG files. Another example of a worker process is the FaceProcessor, which uses an image schema and generates a face schema, which can be used by other worker processes, such as facial recognition.

Implementations provide a configurable set of extensible processing algorithms that convert binary data into text. In this way, the media processing engine can be adapted to specific media file sets. In particular, users can create new worker processes and new partial schemas, and define which partial schemas each worker process creates or uses. In some implementations, the extensibility is provided as an SDK for developers.

FIG. 5C illustrates the process of generating searchable content for a set of media files. The process begins by selecting (540) a set of media files to process. In some instances, the files are selected for importation. In other instances, media files that are already imported are selected for reprocessing or validation.

From the selected set of files, a media file is identified (542) for processing. In some implementations, many separate worker threads are running, so many media files can be processed in parallel. The multiple threads may be on the same physical server, and/or on separate physical servers. Once a media file is identified, a source partial schema is created (544) for the identified media file. FIG. 6A illustrates an example source partial schema 600. Based on the media type of the identified media file, the appropriate initial worker process begins (546). For example, in FIG. 5A, one of the three initial nodes 500-1, 500-2, or 500-3 begins.

Once the initial worker process is complete, the rest of the process flow graph is traversed (548) according to the schema dependencies. When there are multiple worker threads available, two or more processing threads may be working on the same media file.
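A single-threaded sketch of such a traversal is shown below. It is illustrative only: the worker table layout, the traverse function, and the schema names ("image", "face", "palette", "document") are hypothetical, and each worker is modeled as a function that receives the schema instances populated so far and returns its output instances. A worker runs once every one of its input schemas has been populated, so workers that belong to a different initial node never become ready.

```python
def traverse(workers, initial_id, source):
    """Run the initial worker, then every worker whose input schemas are all populated."""
    instances = {"source": source, "keyword": []}
    order = []

    def run(node_id):
        outputs = workers[node_id]["run"](instances)
        # Several workers may insert terms into the shared keyword instance.
        instances["keyword"].extend(outputs.pop("keyword", []))
        instances.update(outputs)
        order.append(node_id)

    run(initial_id)
    pending = {n for n, w in workers.items() if not w["initial"]}
    progress = True
    while pending and progress:
        progress = False
        for node_id in sorted(pending):
            if workers[node_id]["inputs"].issubset(instances):
                run(node_id)
                pending.remove(node_id)
                progress = True
                break
    return instances, order


# Hypothetical workers: an image branch and an unrelated PDF branch.
workers = {
    "image_init": {"initial": True, "inputs": {"source"},
                   "run": lambda inst: {"image": {"path": inst["source"]["path"]}}},
    "faces": {"initial": False, "inputs": {"image"},
              "run": lambda inst: {"face": {"count": 1}, "keyword": ["person"]}},
    "palette": {"initial": False, "inputs": {"image", "source"},
                "run": lambda inst: {"palette": ["blue"], "keyword": ["blue"]}},
    "summary": {"initial": False, "inputs": {"face", "palette"},
                "run": lambda inst: {"keyword": ["portrait"]}},
    "pdf_init": {"initial": True, "inputs": {"source"},
                 "run": lambda inst: {"document": {}}},
    "pdf_text": {"initial": False, "inputs": {"document"},
                 "run": lambda inst: {"keyword": ["text"]}},
}

instances, order = traverse(workers, "image_init", {"path": "photo.jpg"})
print(order)
print(instances["keyword"])
```

A multi-threaded implementation could dispatch every ready worker at once instead of one at a time; the readiness test on input schemas would stay the same.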

In some implementations, during the traversal (548), one or more of the worker processes identifies (550) media files that are embedded in the currently processing media file. For example, a worker process that is scanning a PDF file may identify one or more embedded images. As another example, when processing a video, some implementations select a sample of the video frames and treat the sampled frames as individual images. When embedded media files are identified, the new media files are added (556) to the selected set for processing.

A key aspect of the traversal (548) is to generate searchable content. One way that this is done is to determine keywords. The traversal generates (552) a keyword partial schema and inserts the determined keywords into this partial schema. Note that two or more distinct processes can insert keywords into the keyword partial schema. For example, one worker process could determine a keyword by performing OCR on a specific portion of an image, a second worker process could determine keywords that are the name of a person whose face was recognized, and a third worker process could identify a city name or other geographical location based on GPS coordinates associated with an image.

In some implementations, the traversal (548) extracts (554) other metadata and/or media characteristics as well, and saves the data in an appropriate partial schema. For example, some implementations do a color analysis of an image to determine a color palette.

When the traversal of the process flow graph 124 is complete, the media processing engine 132 continues (558) with the next media file.
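The outer loop of FIG. 5C, in which files discovered during processing are added back to the selected set, might look like the following sketch. The process_all function, the find_embedded callback, and the file names are hypothetical; a real worker process would discover embedded media by parsing the file rather than by consulting a lookup table.

```python
from collections import deque


def process_all(selected, find_embedded):
    """Process each selected media file, adding any embedded files to the work queue."""
    queue = deque(selected)
    seen = set(selected)
    processed = []
    while queue:
        media = queue.popleft()
        processed.append(media)  # stands in for the graph traversal (548)
        for child in find_embedded(media):
            if child not in seen:  # avoid queueing the same file twice
                seen.add(child)
                queue.append(child)
    return processed


# Hypothetical embedding relationships, for illustration only.
embedded = {
    "report.pdf": ["report.pdf#img1", "report.pdf#video1"],
    "report.pdf#video1": ["report.pdf#video1@frame0"],
}
processed = process_all(["report.pdf", "photo.jpg"], lambda m: embedded.get(m, []))
print(processed)
```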

In some implementations, the media processing also includes a “gather” stage. The gather stage can be used for a media file that was broken into smaller pieces (e.g., a PDF broken into individual pages). The gather phase is invoked after all the pieces (e.g., pages) have been processed (e.g., processed in parallel). The gather phase has access to all of the data computed by the child processing pipelines as well as the original parent media file. The gather phase can use this information in a number of ways. In some implementations, the gather phase moves data computed by the child processes into the parent. For example, if an image within a PDF document contains a specific type of graph, or a signature, the gather phase can store that information in the parent media file entry for subsequent searching (e.g., a subsequent search for PDF documents with a specific signature). In some implementations, a gather operation is performed for a specific parent media file as soon as all of its children (and grandchildren, etc.) are processed. In other implementations, there is a single gather phase that is executed after all of the processing of individual files (e.g., perform all of the gathering as a batch process).
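One possible form of a gather step that copies keywords from children (and grandchildren) into the parent entry is sketched below. The gather function, the entries and links dictionaries, and the example keywords are assumptions for illustration; the description does not prescribe how the parent and child data are stored.

```python
from collections import deque


def gather(parent_id, entries, links):
    """Merge the keywords of all descendants into the parent's entry, without duplicates."""
    merged = list(entries[parent_id]["keywords"])
    queue = deque(links.get(parent_id, []))
    while queue:
        child = queue.popleft()
        for keyword in entries[child]["keywords"]:
            if keyword not in merged:
                merged.append(keyword)
        queue.extend(links.get(child, []))  # include grandchildren, etc.
    entries[parent_id]["keywords"] = merged
    return merged


# Hypothetical entries for a two-page PDF whose second page embeds an image.
entries = {
    "doc.pdf": {"keywords": ["contract"]},
    "page1": {"keywords": ["invoice"]},
    "page2": {"keywords": ["signature", "contract"]},
    "page2-img": {"keywords": ["bar chart"]},
}
links = {"doc.pdf": ["page1", "page2"], "page2": ["page2-img"]}
merged = gather("doc.pdf", entries, links)
print(merged)
```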

FIG. 5D shows a pop up window 570 that is displayed during import according to some implementations. The window 570 includes a thumbnail image 572 of the media file being processed, as well as an indicator graphic 582 of import progress. In the implementation illustrated, the window provides additional information that has been determined about the media file. The window 570 includes a set of keywords 574 that have been determined for the media file, a date/time 576 when the media file was created, a name of the location 578 depicted in the media file (e.g., determined based on GPS coordinates), and a palette 580 of colors in the image.

FIGS. 6A-6G illustrate some of the common partial schemas used by the worker processes 134 while processing media files. FIG. 6A illustrates a source partial schema 600, which is filled in based on data directly available about the source media file. FIG. 6B illustrates a document partial schema 602, which is used for media files that are documents (e.g., a PDF or a word processing document). Note that this partial schema is recursive, because it can include references to embedded images and videos. In some implementations, keywords extracted for the embedded images and videos are added to the list of keywords for the document itself.

FIG. 6C illustrates a simple image partial schema 604, used for image files. In some implementations, there are sub-schemas that specify data for specific file formats, such as Exif or IPTC. Some implementations also include computed sub-schemas defined by the output of various processing algorithms. For example, a partial schema that includes a set of statistical properties is computed for image schemas (e.g., containing definitions of common properties, such as histograms or segmentations).

FIG. 6D illustrates a video partial schema 606 used to store information about video files. The video schema 606 is defined for every video file format. It contains values that are common to all video formats and optional sub-schemas that define other video-specific values. In some implementations, videos are converted to images during processing using the processRate option specified in the video schema. In some implementations, the default processRate is set either for the source Import or using global site defaults. The converted images are processed sequentially and the system compresses the results into time-based partial schemas that can be reconstructed for any specified time.

FIG. 6E illustrates a proxy partial schema 608, which is used to store lower resolution copies of a media file. It is common to have multiple proxies for a single media file (different resolutions), so proxy partial schemas are usually stored in a list (which is a container partial schema). The proxy container schema contains a list of proxy objects: alternative representations of the source file at lower resolutions or quality settings. The proxy list is often used during processing to improve the performance of complex analysis operations. For example, a worker process that runs facial detection or convolutional neural networks can generally run on a lower-resolution proxy of the original image or video source file.
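As an illustration of how a worker process might choose among the proxies in the list, the following sketch picks the smallest proxy that still meets a minimum width. The Proxy fields (path, width, height), the pick_proxy function, and the selection rule are all assumptions; the actual fields of the proxy partial schema 608 are those shown in FIG. 6E.

```python
from dataclasses import dataclass


@dataclass
class Proxy:
    # Hypothetical fields; the real proxy schema is defined in FIG. 6E.
    path: str
    width: int
    height: int


def pick_proxy(proxies, min_width):
    """Return the smallest proxy at least min_width pixels wide, or None so the
    caller can fall back to the original source file."""
    candidates = [p for p in proxies if p.width >= min_width]
    return min(candidates, key=lambda p: p.width * p.height, default=None)


proxies = [
    Proxy("p_256.jpg", 256, 171),
    Proxy("p_1024.jpg", 1024, 683),
    Proxy("p_2048.jpg", 2048, 1365),
]
chosen = pick_proxy(proxies, 640)
print(chosen.path)
```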

FIG. 6F illustrates a location partial schema 610, which includes information about a location, typically converting from GPS coordinates to meaningful geographic information, such as city or country. In some implementations, the location partial schema includes more granular information, such as a district within a city or a street name. In some implementations, the location partial schema 610 includes a business name or a common name for the site (e.g., a stadium name).

FIG. 6G illustrates a note partial schema 612, which is a general purpose schema to store notes about a media file. The note partial schema 612 contains a list of string and drawing notations. In some implementations, each entry has an associated user, time, and permission. Drawings are stored using a series of 2D point, line, or polygon arrays.

In addition to the partial schemas illustrated in FIGS. 6A-6G, implementations typically use several other partial schemas as well. One of the additional partial schemas is the keyword partial schema. The keyword schema contains some fields that are used for media file searches. In some implementations, keywords are segmented by confidence, and a separate set of fields is used to store suggestion terms used during type-ahead instant search.

The link partial schema manages references between media files. A link stores a list of dependent media files and a parent media file. These fields are used by the processing fabric to re-submit work to the system for additional processing. For example, embedded images and videos are extracted from PDF documents as dependent links and frames from a video are extracted as image files for subsequent processing.

In the partial schema definitions shown in FIGS. 6A-6G, a data type of the form List < > indicates that there can be one or more instances of the field. A list has a specified order.

Implementations provide a standard set of core schemas, and this set of schemas can be extended in several ways. First, some implementations enable a user to add additional data fields to existing schemas. For example, a user could add an additional data field to the image partial schema 604 to specify whether each image is in color or black and white. The user specifying the additional fields also specifies the data types of the additional data fields.

A user can also create entirely new partial schemas, such as the custom partial schema 620 illustrated in FIG. 6H. A custom partial schema 620 typically defines a group of related data fields that are unique to a specific application. For example, for a collection of images for major league baseball, each of the images could be assigned one or more team names, one or more player names, one or more corporate names whose logos are captured in the images, and so on. This information can be stored in the data fields of a custom schema. Each data field in a custom schema has a specified data type, and may store a single value or a list of values (which may be ordered or unordered). In general, the number of custom fields in a custom schema is not limited. In the illustrated implementation, a user has defined a set of r field names field_name_1 624-1, field_name_2 624-2, . . . , field_name_r 624-r. In some implementations, all of the media files within one collection share the same set of schemas, including the custom schemas. In some of these implementations, only the schemas that have corresponding data are stored. In some implementations, various subcollections of media files share a same set of schemas, and the sets of schemas can be customized according to each subcollection (e.g., some subcollections add additional data fields to some of the core schemas and add some additional partial schemas).

FIG. 7 is a screen shot of a user interface 224 for a media application 112. In this screen shot, the user has entered the term “lakeshore” into the search window 702, and the application 112 has retrieved a set of search results 704, which are thumbnail images of media files that match the term “lakeshore.” The media files corresponding to the search results 704 were processed by the media processing engine 132 to extract keywords. The extracted keywords may include the term “lakeshore” literally, or the query engine 330 may match the search term “lakeshore” to other similar keywords, such as “lake.”

Implementations can handle a wide range of media file formats, including images, videos, and container documents that have embedded media. Worker processes have access to the full source document, and are free to process the native data. For example, a worker process can access the full video source and perform processing that requires access to all of the frames within a video and the native metadata stored with the video file. Similarly, a worker process for multipage documents (e.g., PDF files) can examine the full text of the file and generate summary keywords or information that improves search and navigation. In some implementations, a multipage document is broken apart into separate pages, and each page is processed by a separate worker process to identify summary keywords (and potentially extract embedded images and/or video for separate processing). When all of the individual pages have been processed, a “gather” worker process combines the results to create a list of search terms for the parent document. Running multiple worker processes in parallel can dramatically improve performance, both because of the multiple threads and because searching individual pages is faster than searching an entire document.

As indicated in FIG. 5C, worker processes can submit additional media files to be processed by the system. This provides a way to break up computations into smaller chunks or convert between media formats. The worker process for PDF files, for example, can submit the images and videos embedded in the document for processing. After the top-level PDF file finishes, the system submits any new derived files back to the processing pipeline.

Some implementations break down large tasks to improve load balancing. A video slice worker process can break up videos into individual images or into smaller segments (e.g., chunks of a fixed small number of frames or chunks that align with shots). Some implementations use a worker process that extracts every Nth frame and submits it as a dependent image. Some implementations just choose a sample frame for processing. Providing this control in the user-configurable worker processes enables optimized processing.
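The two slicing strategies mentioned above (every Nth frame, and chunks of a fixed number of frames) reduce to simple index arithmetic, as the following sketch shows. The function names are hypothetical, frame indices are zero-based, and chunk ends are exclusive; aligning chunks with shots would require shot detection, which is not modeled here.

```python
def every_nth_frame(frame_count, n):
    """Indices of the frames a worker would submit as dependent images."""
    return list(range(0, frame_count, n))


def fixed_chunks(frame_count, chunk_size):
    """(start, end) frame ranges, end exclusive, each holding at most chunk_size frames."""
    return [
        (start, min(start + chunk_size, frame_count))
        for start in range(0, frame_count, chunk_size)
    ]


sampled = every_nth_frame(10, 4)
chunks = fixed_chunks(10, 4)
print(sampled)
print(chunks)
```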

In some cases, the results of dependent processing can benefit from collation to optimize their storage. For example, after processing every Nth frame in a video file, it can be useful to compress their schemas, which are largely duplicated but have minor differences. It may be useful to store the schemas computed by analyzing a limited number of individual frames (e.g., every Nth frame), but then do facial processing on every frame. The number of faces for each range of frames can be stored as metadata for the video. Collation is performed after all of the derived media files (e.g., processing of individual video frames as images) have completed processing. A database search can be used to find all of the derived media files and compress their results.
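One simple way to collate per-frame results into ranges is run-length compression, sketched below for the face-count example. This is an assumed technique chosen for illustration, not necessarily the compression used by any implementation; the collate and value_at names and the sample data are hypothetical. The lookup function shows how a value can be recovered for any frame from the compressed ranges.

```python
def collate(per_frame):
    """Compress (frame_index, value) pairs, sorted by frame, into
    (first_frame, last_frame, value) ranges with inclusive ends."""
    ranges = []
    for frame, value in per_frame:
        if ranges and ranges[-1][2] == value:
            # Same value as the previous frame: extend the current range.
            ranges[-1] = (ranges[-1][0], frame, value)
        else:
            ranges.append((frame, frame, value))
    return ranges


def value_at(ranges, frame):
    """Recover the stored value for a given frame, or None if it is not covered."""
    for start, end, value in ranges:
        if start <= frame <= end:
            return value
    return None


# Hypothetical face counts for six consecutive frames.
face_counts = [(0, 2), (1, 2), (2, 3), (3, 3), (4, 3), (5, 0)]
ranges = collate(face_counts)
print(ranges)
print(value_at(ranges, 3))
```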

The foregoing description, for purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various implementations with various modifications as are suited to the particular use contemplated.

What is claimed is:
 1. A method of generating searchable content for visual media files, comprising: at a computing system having one or more processors and memory: defining a set of schemas, wherein the set of schemas includes a source schema and a keyword schema; defining a plurality of worker processes, wherein each worker process definition specifies a respective set of one or more input schemas from the defined set of schemas and each worker process definition specifies a respective set of one or more output schemas from the defined set of schemas; building a dependency graph that includes a node for each worker process, with dependencies based on the input schemas and output schemas for each worker process; constructing a respective source schema instance for a selected visual media file, including filling in fields in the source schema instance using information about the selected visual media file; traversing nodes in the dependency graph beginning with a respective initial worker process corresponding to a media type of the selected visual media file, wherein one or more worker processes executed during the traversal inserts search terms into a respective keyword schema instance; and storing data from the keyword schema instance in a database for subsequent searching of visual media files.
 2. The method of claim 1, further comprising: receiving a search query from a user, wherein the search query comprises a plurality of textual terms; matching the received search query to one or more keyword schema instances; and returning, to the user, search results that identify visual media files corresponding to the matched keyword schema instances.
 3. The method of claim 1, wherein traversing nodes in the dependency graph comprises: executing a plurality of worker processes, which construct a plurality of additional distinct schema instances; and the method further comprises storing, in the database, data for the plurality of the additional schema instances.
 4. The method of claim 3, further comprising: receiving a search query from a user, wherein the search query comprises a plurality of textual terms; matching the received search query to one or more of the stored schema instances; and returning, to the user, search results that identify visual media files corresponding to the matched schema instances.
 5. The method of claim 1, wherein: the selected visual media file is a first visual media file in a set of visual media files; and traversing nodes in the dependency graph for a first visual media file includes executing a first worker process that extracts one or more additional visual media files from within the first visual media file and adds the additional visual media files to the set of visual media files.
 6. The method of claim 5, wherein the set of visual media files includes one or more image files.
 7. The method of claim 5, wherein the set of visual media files includes one or more video files.
 8. The method of claim 5, wherein the set of visual media files includes one or more multipage documents.
 9. A computer system, comprising: one or more processors; memory; and one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs comprising instructions for: defining a set of schemas, wherein the set of schemas includes a source schema and a keyword schema; defining a plurality of worker processes, wherein each worker process definition specifies a respective set of one or more input schemas from the defined set of schemas and each worker process definition specifies a respective set of one or more output schemas from the defined set of schemas; building a dependency graph that includes a node for each worker process, with dependencies based on the input schemas and output schemas for each worker process; constructing a respective source schema instance for a selected visual media file, including filling in fields in the source schema instance using information about the selected visual media file; traversing nodes in the dependency graph beginning with a respective initial worker process corresponding to a media type of the selected visual media file, wherein one or more worker processes executed during the traversal inserts search terms into a respective keyword schema instance; and storing data from the keyword schema instance in a database for subsequent searching of visual media files.
 10. The computer system of claim 9, wherein the one or more programs further comprise instructions for: receiving a search query from a user, wherein the search query comprises a plurality of textual terms; matching the received search query to one or more keyword schema instances; and returning, to the user, search results that identify visual media files corresponding to the matched keyword schema instances.
 11. The computer system of claim 9, wherein traversing nodes in the dependency graph comprises: executing a plurality of worker processes, which construct a plurality of additional distinct schema instances; and the one or more programs further comprise instructions for storing, in the database, data for the plurality of the additional schema instances.
 12. The computer system of claim 11, wherein the one or more programs further comprise instructions for: receiving a search query from a user, wherein the search query comprises a plurality of textual terms; matching the received search query to one or more of the stored schema instances; and returning, to the user, search results that identify visual media files corresponding to the matched schema instances.
 13. The computer system of claim 9, wherein: the selected visual media file is a first visual media file in a set of visual media files; and traversing nodes in the dependency graph for a first visual media file includes executing a first worker process that extracts one or more additional visual media files from within the first visual media file and adds the additional visual media files to the set of visual media files.
 14. The computer system of claim 9, wherein the set of visual media files includes one or more image files.
 15. The computer system of claim 9, wherein the set of visual media files includes one or more video files.
 16. A non-transitory computer readable storage medium storing one or more programs configured for execution by one or more processors of a computer system, the one or more programs comprising instructions for: defining a set of schemas, wherein the set of schemas includes a source schema and a keyword schema; defining a plurality of worker processes, wherein each worker process definition specifies a respective set of one or more input schemas from the defined set of schemas and each worker process definition specifies a respective set of one or more output schemas from the defined set of schemas; building a dependency graph that includes a node for each worker process, with dependencies based on the input schemas and output schemas for each worker process; constructing a respective source schema instance for a selected visual media file, including filling in fields in the source schema instance using information about the selected visual media file; traversing nodes in the dependency graph beginning with a respective initial worker process corresponding to a media type of the selected visual media file, wherein one or more worker processes executed during the traversal inserts search terms into a respective keyword schema instance; and storing data from the keyword schema instance in a database for subsequent searching of visual media files.
 17. The computer readable storage medium of claim 16, wherein the one or more programs further comprise instructions for: receiving a search query from a user, wherein the search query comprises a plurality of textual terms; matching the received search query to one or more keyword schema instances; and returning, to the user, search results that identify visual media files corresponding to the matched keyword schema instances.
 18. The computer readable storage medium of claim 16, wherein traversing nodes in the dependency graph comprises: executing a plurality of worker processes, which construct a plurality of additional distinct schema instances; and the one or more programs further comprise instructions for storing, in the database, data for the plurality of the additional schema instances.
 19. The computer readable storage medium of claim 18, wherein the one or more programs further comprise instructions for: receiving a search query from a user, wherein the search query comprises a plurality of textual terms; matching the received search query to one or more of the stored schema instances; and returning, to the user, search results that identify visual media files corresponding to the matched schema instances.
 20. The computer readable storage medium of claim 16, wherein: the selected visual media file is a first visual media file in a set of visual media files; and traversing nodes in the dependency graph for a first visual media file includes executing a first worker process that extracts one or more additional visual media files from within the first visual media file and adds the additional visual media files to the set of visual media files.